Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition

Authors

Abstract

Autoregressive (AR) models, such as attention-based encoder-decoder models and the RNN-Transducer, have achieved great success in speech recognition. They predict the output sequence conditioned on the previous tokens and the acoustic encoded states, which is inefficient on GPUs. Non-autoregressive (NAR) models can get rid of the temporal dependency between output tokens and predict the entire output sequence in one inference step. However, the NAR model still faces two major problems. Firstly, there is a gap in performance between NAR models and advanced AR models. Secondly, it is difficult for most NAR models to train and converge. We propose a hybrid autoregressive and non-autoregressive transformer (HANAT) model, which integrates AR and NAR models deeply by sharing parameters. We assume that the AR model will assist the NAR model to learn some linguistic dependencies and accelerate convergence. Furthermore, two-stage hybrid inference is applied to improve model performance. All experiments are conducted on the Mandarin dataset AISHELL-1 and the English dataset LibriSpeech (960 h). The results show that HANAT can achieve competitive performance with the AR model and outperform many complicated NAR models. Besides, its RTF is only 1/5 of that of the AR model.
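The contrast the abstract draws between AR and NAR decoding can be illustrated with a toy sketch. It is not the HANAT model: `toy_ar_step` and `toy_nar_parallel` are hypothetical stand-ins for a trained decoder, and the token values are made up. The sketch only counts decoder calls, which is why AR decoding is sequential while NAR decoding needs a single pass. The `rtf` helper uses the standard definition of real-time factor (processing time divided by audio duration).

```python
from typing import Callable, List, Tuple

EOS = 0  # hypothetical end-of-sequence token id
TARGET = [5, 3, 7]  # made-up token ids standing in for a transcript


def toy_ar_step(prefix: List[int]) -> int:
    """Stand-in AR decoder: next token given the tokens emitted so far."""
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else EOS


def toy_nar_parallel(max_len: int) -> List[int]:
    """Stand-in NAR decoder: all positions at once, padded with EOS."""
    return TARGET + [EOS] * (max_len - len(TARGET))


def ar_decode(step_fn: Callable[[List[int]], int], max_len: int) -> Tuple[List[int], int]:
    """Greedy AR decoding: one decoder call per step, each depending on the prefix."""
    tokens: List[int] = []
    calls = 0
    for _ in range(max_len):
        nxt = step_fn(tokens)
        calls += 1
        if nxt == EOS:
            break
        tokens.append(nxt)
    return tokens, calls


def nar_decode(parallel_fn: Callable[[int], List[int]], max_len: int) -> Tuple[List[int], int]:
    """NAR decoding: a single decoder call, then truncate at the first EOS."""
    output = parallel_fn(max_len)
    tokens: List[int] = []
    for tok in output:
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens, 1


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    return processing_seconds / audio_seconds


ar_tokens, ar_calls = ar_decode(toy_ar_step, max_len=10)
nar_tokens, nar_calls = nar_decode(toy_nar_parallel, max_len=10)
print(ar_tokens, ar_calls)
print(nar_tokens, nar_calls)
```

Both toy decoders produce the same tokens here, but the AR loop needs one call per token plus one for the end marker, whereas the NAR path needs one call in total. Real systems differ in many other respects, so the call count is only a rough proxy for the RTF gap the abstract reports.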


Similar resources

Nonlinear mixture autoregressive hidden Markov models for speech recognition

Gaussian mixture models are a very successful method for modeling the output distribution of a state in a hidden Markov model (HMM). However, this approach is limited by the assumption that the dynamics of speech features are linear and can be modeled with static features and their derivatives. In this paper, a nonlinear mixture autoregressive model is used to model state output distributions (...


Statistical Inference in Autoregressive Models with Non-negative Residuals

Normality of the residuals is one of the usual assumptions of autoregressive models, but in practice we sometimes face the case of non-negative residuals. In this paper we consider some autoregressive models with non-negative residuals as competing models and derive the maximum likelihood estimators of the parameters based on the modified approach and the EM algorithm for the competing models. Also,...


Autoregressive HMMs for speech synthesis

We propose the autoregressive HMM for speech synthesis. We show that the autoregressive HMM supports efficient EM parameter estimation and that we can use established effective synthesis techniques such as synthesis considering global variance with minimal modification. The autoregressive HMM uses the same model for parameter estimation and synthesis in a consistent way, in contrast to the stan...


Mixture autoregressive hidden Markov models for speech signals

In this paper a signal modeling technique based upon finite mixture autoregressive probabilistic functions of Markov chains is developed and applied to the problem of speech recognition, particularly speaker-independent recognition of isolated digits. Two types of mixture probability densities are investigated: finite mixtures of Gaussian autoregressive densities (GAM) and nearest-neighbor part...


Autoregressive Models for Image Coding

Recently, the image and video coding community has witnessed several proposals to improve coding efficiency by exploiting perceptual redundancy of texture. Most of these approaches are based on segmentation and non-parametric texture models popular in the computer graphics domain. Although not a generic model for everything we might call texture, the simple (and parametric) autoregressive model...



Journal

Journal title: IEEE Signal Processing Letters

Year: 2022

ISSN: 1558-2361, 1070-9908

DOI: https://doi.org/10.1109/lsp.2022.3152128